Learning languages from parallel corpora

Abstract

This work describes a blueprint for an application that generates language learning exercises from parallel corpora. Word alignment and parallel structures allow the automatic assessment of sentence pairs in the source and target languages, while users of the application continuously improve the quality of the data with their interactions, thus crowdsourcing parallel language learning material. Through triangulation, their assessment can be transferred to language pairs other than the original ones if multiparallel corpora are used as a source. Several challenges need to be addressed for such an application to work, and we will discuss three of them here. First, the question of how adequate learning material can be identified in corpora has received some attention in the last decade, and we will detail what the structure of parallel corpora implies for that selection. Secondly, we will consider which type of exercises can be generated automatically from parallel corpora such that they foster learning and keep learners motivated. And thirdly, we will highlight the potential of employing users, that is both teachers and learners, as crowdsourcers to help improve the material.

Similar resources

Creating Multilingual Parallel Corpora in Indian Languages

This paper presents a description of the parallel corpora being created simultaneously in 12 major Indian languages including English under a nationally funded project named Indian Languages Corpora Initiative (ILCI) run through a consortium of institutions across India. The project runs in two phases. The first phase of the project has two distinct goals creating parallel sentence aligned corp...

Parallel corpora for medium density languages

The choice of natural language technology appropriate for a given language is greatly impacted by density (availability of digitally stored material). More than half of the world speaks medium density languages, yet many of the methods appropriate for high or low density languages yield suboptimal results when applied to the medium density case. In this paper we describe a general methodology f...

Extraction of Parallel Corpora from Comparable Corpora

The size and quality of the parallel corpus used for training greatly impact the translation quality of an SMT system. But there are very few sources of parallel corpora for many language pairs. This is a major hurdle in the development of good SMT systems. To alleviate this problem, comparable or non-parallel corpora, which are largely available, can be exploited to extract parallel data...

Learning Translations of Named-Entity Phrases from Parallel Corpora

We develop a new approach to learning phrase translations from parallel corpora, and show that it performs with very high coverage and accuracy in choosing French translations of English named-entity phrases in a test corpus of software manuals. Analysis of a subset of our results suggests that the method should also perform well on more general phrase translation tasks.

Learning Transfer Rules for Machine Translation from Parallel Corpora

In this paper we present JETCAT, a Japanese-English transfer-based machine translation system. Our main research contribution is that the transfer rules are not handcrafted but are learnt automatically from a parallel corpus. The system has been implemented in Amzi! Prolog, which offers scalability for large rule bases, full Unicode support for Japanese characters, and several APIs for the seam...

Journal

Journal title: Slovenščina 2.0: Empirične, Aplikativne in Interdisciplinarne Raziskave

Year: 2022

ISSN: 2335-2736

DOI: https://doi.org/10.4312/slo2.0.2022.2.101-131